@sanderegg
Member

What do these changes do?

Ensures concurrency is kept to a minimum.
Adds an additional exception that is raised when the scheduler is not responding as expected.

Related issue/s

How to test

Dev-ops

@sanderegg sanderegg added this to the Cheops milestone Sep 19, 2025
@sanderegg sanderegg self-assigned this Sep 19, 2025
@sanderegg sanderegg added the a:director-v2 issue related with the director-v2 service label Sep 19, 2025
@codecov

codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.79%. Comparing base (16dafee) to head (f924987).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8400      +/-   ##
==========================================
- Coverage   87.91%   87.79%   -0.12%     
==========================================
  Files        1951     1523     -428     
  Lines       75961    63264   -12697     
  Branches     1336      674     -662     
==========================================
- Hits        66779    55545   -11234     
+ Misses       8782     7485    -1297     
+ Partials      400      234     -166     
| Flag | Coverage Δ |
| --- | --- |
| integrationtests | 63.94% <75.00%> (-0.02%) ⬇️ |
| unittests | 86.19% <77.77%> (-0.38%) ⬇️ |

| Components | Coverage Δ |
| --- | --- |
| pkg_aws_library | ∅ <ø> (∅) |
| pkg_celery_library | ∅ <ø> (∅) |
| pkg_dask_task_models_library | ∅ <ø> (∅) |
| pkg_models_library | ∅ <ø> (∅) |
| pkg_notifications_library | ∅ <ø> (∅) |
| pkg_postgres_database | ∅ <ø> (∅) |
| pkg_service_integration | ∅ <ø> (∅) |
| pkg_service_library | 72.43% <100.00%> (+0.02%) ⬆️ |
| pkg_settings_library | ∅ <ø> (∅) |
| pkg_simcore_sdk | 84.99% <ø> (ø) |
| agent | 93.53% <ø> (ø) |
| api_server | 91.96% <ø> (ø) |
| autoscaling | 95.74% <75.00%> (-0.04%) ⬇️ |
| catalog | 92.36% <ø> (ø) |
| clusters_keeper | 99.13% <ø> (ø) |
| dask_sidecar | 92.38% <ø> (ø) |
| datcore_adapter | 97.94% <ø> (ø) |
| director | 75.90% <ø> (+0.08%) ⬆️ |
| director_v2 | 90.92% <75.00%> (-0.01%) ⬇️ |
| dynamic_scheduler | 96.27% <ø> (ø) |
| dynamic_sidecar | 90.46% <ø> (ø) |
| efs_guardian | 89.62% <ø> (ø) |
| invitations | 91.44% <ø> (ø) |
| payments | 92.62% <ø> (ø) |
| resource_usage_tracker | 92.18% <ø> (-0.11%) ⬇️ |
| storage | 86.74% <ø> (+0.29%) ⬆️ |
| webclient | ∅ <ø> (∅) |
| webserver | 87.99% <ø> (-0.02%) ⬇️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mergify
Contributor

mergify bot commented Sep 19, 2025

🧪 CI Insights

Here's what we observed from your CI run for f924987.

✅ Passed Jobs With Interesting Signals

| Pipeline | Job | Signal | Health on master | Retries |
| --- | --- | --- | --- | --- |
| CI | integration-tests | Base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 | Healthy | 2 |
| | system-tests | Base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 | Healthy | 1 |
| | unit-tests | Base branch is broken, but the job passed. Looks like this might be a real fix 💪 | Broken | 0 |

@sanderegg sanderegg marked this pull request as ready for review September 22, 2025 07:20
@sanderegg sanderegg requested a review from Copilot September 22, 2025 08:07
Contributor

Copilot AI left a comment

Pull Request Overview

This PR enhances the computational backend stability by reducing connection concurrency and improving error handling. The changes focus on preventing connection overload and ensuring better exception handling for disconnected scheduler scenarios.

  • Reduced maximum concurrent client connections from 10 to 1 to minimize connection load
  • Added exception handling for dask client cancellation errors to improve robustness
  • Enhanced task resource requirements to ensure minimum CPU allocation meets dask worker threading requirements
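The first two points can be illustrated with a minimal sketch. It is not the code from this PR: `ClientCancelledError`, `SchedulerNotRespondingError`, and `call_scheduler` are hypothetical names, and a plain `asyncio.Semaphore` stands in for whatever limiter the service actually uses. The idea is only that every scheduler call holds a limiter sized to 1, and that a low-level cancellation error is translated into a dedicated exception.

```python
import asyncio


class ClientCancelledError(Exception):
    """Hypothetical stand-in for the cancellation error raised by the client library."""


class SchedulerNotRespondingError(RuntimeError):
    """Hypothetical domain error: the scheduler did not respond as expected."""


async def call_scheduler(limiter: asyncio.Semaphore, operation):
    """Run one scheduler operation while holding the connection limiter."""
    async with limiter:
        try:
            return await operation()
        except ClientCancelledError as exc:
            # Translate the low-level error into one that callers can handle explicitly.
            raise SchedulerNotRespondingError("scheduler did not respond") from exc


async def demo() -> tuple[int, bool]:
    limiter = asyncio.Semaphore(1)  # assumed limit of 1, as in the summary above
    active = 0
    peak = 0

    async def healthy_operation() -> str:
        # Track how many operations run at the same time.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return "ok"

    async def cancelled_operation() -> str:
        raise ClientCancelledError

    # Five concurrent callers still go through the limiter one at a time.
    await asyncio.gather(*(call_scheduler(limiter, healthy_operation) for _ in range(5)))

    try:
        await call_scheduler(limiter, cancelled_operation)
    except SchedulerNotRespondingError:
        translated = True
    else:
        translated = False
    return peak, translated


peak_concurrency, error_translated = asyncio.run(demo())
```

With a limit of 1 the calls are effectively serialized, which trades throughput for a lower connection load on the scheduler.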

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| services/director-v2/src/simcore_service_director_v2/modules/dask_client.py | Reduces max concurrent connections and adds exception handling for dask client cancellation errors |
| services/autoscaling/src/simcore_service_autoscaling/modules/cluster_scaling/_provider_computational.py | Ensures minimum CPU allocation of 1.0 for tasks to match dask worker threading requirements |
| packages/service-library/tests/redis/test_semaphore_decorator.py | Updates test configurations and adds new test cases for semaphore functionality |
| packages/service-library/src/servicelib/redis/_semaphore_decorator.py | Adds configurable expected lock time parameter to semaphore decorators |
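
The minimum CPU allocation described for `_provider_computational.py` could look roughly like the sketch below. The constant `MIN_TASK_CPUS`, the helper `with_minimum_cpus`, and the `"CPU"` resource key are assumptions made for illustration; the actual change may be structured differently.

```python
# Hypothetical floor: the summary above states a minimum CPU allocation of 1.0.
MIN_TASK_CPUS = 1.0


def with_minimum_cpus(resources: dict[str, float]) -> dict[str, float]:
    """Return a copy of the task resources with the CPU request raised to the floor."""
    requested_cpus = resources.get("CPU", 0.0)
    return {**resources, "CPU": max(requested_cpus, MIN_TASK_CPUS)}


small_task = with_minimum_cpus({"CPU": 0.1, "RAM": 1024.0})
large_task = with_minimum_cpus({"CPU": 4.0})
no_cpu_task = with_minimum_cpus({"RAM": 512.0})
```

Requests below the floor are raised to it, larger requests are left unchanged, and the input mapping is not mutated.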

Contributor

@wvangeit wvangeit left a comment

Thanks.

@sanderegg sanderegg merged commit 3ba41bc into ITISFoundation:master Sep 22, 2025
197 of 201 checks passed
@sanderegg sanderegg deleted the computational-backend/stability-improvements-step7 branch September 22, 2025 11:12
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Sep 24, 2025
65 tasks

Labels

a:computational clusters, a:director-v2 (issue related with the director-v2 service)
